Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Partial code improvements with respect to Clang 19 warnings #1516

Draft
wants to merge 3 commits into
base: main
Choose a base branch
from

Conversation

Explorer09
Copy link
Contributor

This branch contains my proposal of the code quality changes with respect to Clang 19 warnings. (Notably, -Wshorten-64-to-32 is enabled by default in Clang 19.) I would try to break these into small changes so that these commits may be cherry-picked into main early.

(This pull request is related to #1513)

@BenBE BenBE added enhancement Extension or improvement to existing feature code quality ♻️ Code quality enhancement build system 🔧 Affects the build system rather then the user experience labels Aug 12, 2024
CRT.c Outdated Show resolved Hide resolved
CPUMeter.c Outdated Show resolved Hide resolved
DynamicMeter.c Outdated Show resolved Hide resolved
DynamicMeter.c Outdated Show resolved Hide resolved
DynamicMeter.c Outdated Show resolved Hide resolved
@Explorer09 Explorer09 force-pushed the llvm19-compat branch 3 times, most recently from 032721c to 161c4ac Compare August 13, 2024 00:19
Comment on lines -45 to +51
size_t columns = HeaderLayout_getColumns(this->scr->header->headerLayout);
unsigned int columns = HeaderLayout_getColumns(this->scr->header->headerLayout);
MetersPanel** meterPanels = xMallocArray(columns, sizeof(MetersPanel*));
Settings* settings = this->host->settings;

for (size_t i = 0; i < columns; i++) {
for (unsigned int i = 0; i < columns; i++) {
char titleBuffer[32];
xSnprintf(titleBuffer, sizeof(titleBuffer), "Column %zu", i + 1);
xSnprintf(titleBuffer, sizeof(titleBuffer), "Column %u", i + 1);
Copy link
Member

@BenBE BenBE Aug 13, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this one is a step backwards. Stick with size_t.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can't understand what you mean. What's the reason of keeping the size_t?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Two-fold:

  1. It's a number of items (columns)
  2. unsigned int looks kinda unwieldy IMHO.
  3. For the format string %zu is much more portable than uint32_t which would require you to use %"PRIu32" instead of just %u.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@BenBE Well, I have a different view on this:

  1. Object sizes with no likelihood of overflowing an int type would not need size_t (that is, unsigned long) for storing.
  2. In x86-64, incrementing and comparing int types are slightly cheaper than comparing long or size_t (in terms of code size). When int types can suffice, using long would make the code one-byte-per-operation larger.
  3. Header column count is unlikely to exceed termical column count (COLS), and the latter is bound to signed int type already.
  4. And no, we don't need PRIu32 here. It's unsigned int, not uint32_t.

This case is slightly different from string lengths. My view is string lengths in size_t can serve as a defence in depth, where the user is possible to feed a 4 GiB large string in a config file. But if the object size would be bound by any other factor, then size_t might be a waste.

(By the way, I did read this article from EWONTFIX when I write my changes.)

I hope htop developers can reconsider.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Personal opinion following:

@BenBE Well, I have a different view on this:

  1. Object sizes with no likelihood of overflowing an int type would not need size_t (that is, unsigned long) for storing.

Nonetheless size_t is less about the number of entities itself, but that you are counting entities.

Otherwise you could argue that "this hashtable implementation only needs short for its size, because I never store more than 65535 elements in it".

I know, that example is hyperbole, but follows that same logic. When I ask for size_t it's really little about "how many", but the "what" (AKA semantics).

  1. In x86-64, incrementing and comparing int types are slightly cheaper than comparing long or size_t (in terms of code size). When int types can suffice, using long would make the code one-byte-per-operation larger.

That might be true for x86_64, but that's not the only platform we need to support. There's at least ARM, ARM64, MIPS, and probably several others, where htop is runing. Unless performance really IS a bottleneck for these functions (AFAICS it's not), priority should be on correctness, not code size or performance.

  1. Header column count is unlikely to exceed terminal column count (COLS), and the latter is bound to signed int type already.

See arguments from the article you linked.

That limit is implied by the current interface from ncurses, but a future version could just switch to size_t for its column and row counts (potentially breaking BC at that point); there are good reasons like huge screens where terminals COULD have that many columns/rows.

  1. And no, we don't need PRIu32 here. It's unsigned int, not uint32_t.

k, ACK on that. But that's only a minor note regarding the chosen type signature.

This case is slightly different from string lengths. My view is string lengths in size_t can serve as a defence in depth, where the user is possible to feed a 4 GiB large string in a config file. But if the object size would be bound by any other factor, then size_t might be a waste.

Allocations are typically backed by 4 KiB pages at minimum, with the usual memory manager implementation pre-allocating some memory to handle sudden spikes. Thus unless the savings are major or we have massive amounts of objects (where storing UTF-8 instead of UTF-16 saves several hundreds of megabytes (like with some browsers) there's usually little need to save every last byte. Furthermore, by storing smaller data types you sometimes may make the alignment of the data worse, thus actually reducing performance.

From my experience I can tell, that I've seen real-life cases of where uint16_t vs. uint32_t was actually harmful due to alignment and memory access patterns. For 32 bit values on a 64 bit architecture, similar considerations apply.

(By the way, I did read this article from EWONTFIX when I write my changes.)

FWIW, I personally side with the author.

I hope htop developers can reconsider.

@cgzones @fasterit @ginggs @natoscott : Your opinion on this matter of size_t vs unsigned int?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm happy to stick with size_t here.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seconded
/DLange

Copy link
Contributor Author

@Explorer09 Explorer09 Aug 21, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@BenBE @natoscott @fasterit
This commit was to fix the implicit downcast warnings from HeaderLayout_getColumns() by Clang 19 (the messages was: "implicit conversion loses integer precision: 'size_t' (aka 'unsigned long') to 'unsigned int' [-Wshorten-64-to-32]"). HeaderLayout_getColumns() was an inline function that returns a size_t that, in certain caller codes, has to downcast to int because terminal column variables are in int type to follow curses API.

With this background, I would like some clarifications on which decision to make:
(A) Keep HeaderLayout_getColumns() return value in size_t. Use size_t for local variables that hold the return values of HeaderLayout_getColumns(), and add explicit downcasts when needed (e.g. when used in the context of terminal column widths).
(B) Let HeaderLayout_getColumns() return an unsigned int. Keep using int type for places like terminal columns. But when using in array iterators (like in this example code you guys are commented on), upcast to size_t.
(C) Let HeaderLayout_getColumns() return an unsigned int. Also use unsigned int for array iterators that would never cross the bound of HeaderLayout_getColumns(). This is my original proposal.

EDIT: Just a reminder that the (A) option is the least preferred to me as the downcasts can hide potential errors when the original "getColumns" values are out of bounds. Although assertions may be added to ensure the numbers are in bound, they still look like an overkill solution to me (making the problem more complicated than necessary).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A.

Add explicit casts to fix the "-Wshorten-64-to-32" warning (enabled by
default in Clang 19).

Also shorten the type width for some local variables (where "uint32_t"
is enough and no need for "unsigned long").

Signed-off-by: Kang-Che Sung <[email protected]>
Don't assume a meter's caption string length will fit the type of
'int'. Use 'size_t' for the string lengths instead. Even though an
overflow on the string length is unlikely, this makes the code more
robust.

Signed-off-by: Kang-Che Sung <[email protected]>
Change return type of HeaderLayout_getColumns() from 'size_t' to
'unsigned int', and use 'unsigned int' type for all Header column
iterators.

Using 'size_t' is unnecessary (you cannot have 2^32 columns anyway)
and would cause "-Wshorten-64-to-32" warnings (enabled by default in
Clang 19).
unsigned int minutes = totalSeconds / 60;
unsigned int seconds = totalSeconds % 60;
unsigned int milliseconds = microseconds / 1000;
unsigned int minutes = (unsigned int)totalSeconds / 60;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't the cast here happen after the division?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not necessary because of the (totalSeconds < 600) conditional. Although I argue this is more of a style issue...
(Compiler should catch that (unsigned int)(totalSeconds / 60) can be optimized to ((unsigned int)totalSeconds) / 60, no?)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not necessarily, as (unsigned int)((1UL<<32) / 60) != (unsigned int)(1UL<<32) / 60.

Won't happen in this case based on the previous check, but the correctness should take priority here.

@@ -318,25 +325,25 @@ static void LEDMeterMode_draw(Meter* this, int x, int y, int w) {
y + 2;
attrset(CRT_colors[LED_COLOR]);
const char* caption = Meter_getCaption(this);
mvaddstr(yText, x, caption);
int xx = x + strlen(caption);
mvaddnstr(yText, x, caption, w);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This one is a little beyond my knowledge. According to man pages, the n in mvaddnstr refers to number of "characters" to print. However, what I really want to limit is the number of terminal columns, not the number of Unicode characters (this distinction is needed if East Asian Wide characters are in the strings).

Perhaps a good way is to test how this function call would output, would it be "12" or just "":

/* "1234 fullwidth" */
addnstr("\uFF11\uFF12\uFF13\uFF14 fullwidth", 2);

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about wcswidth(3)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What I wish is a function that does an inverse of wcswidth(3). wcswidth() returns the terminal column width given a string and a number of (Unicode) characters limit. What I wished is retrieving the number of characters (or number of bytes) given a string and a column width limit. RichString_appendnWideColumns() might give the closest I need, but I want a function that operates on a const char* rather than a RichString object. Maybe I would write one from scratch?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately that is easier said than actual done, but based on the fact that wcswidth returns >=0 you can probably build something like this by calling wcwidth(3) for each character in turn until you overshoot the limit, in which case you know the last character was too much (this accounts for zero-width-combiners and other special cases). The down-side is, that you will need to transform the const char* into wchar_t* first, thus having this functionality live with the other RichString functions might be reasonable.

NB: There's actually an issue that might benefit from this functionality in #462 IIRC.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@BenBE Yes, I'm aware of PR #462. And perhaps PR #1231 can benefit, too, as I wish this function would work as a word wrapper that is also CJK-aware.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
build system 🔧 Affects the build system rather then the user experience code quality ♻️ Code quality enhancement enhancement Extension or improvement to existing feature
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants